A two-stage study was conducted to compare the ability estimates yielded by tailored testing procedures based on the one-parameter logistic (1PL) and three-parameter logistic (3PL) models. The first stage of the study employed real data, while the second stage employed simulated data. In the first stage, response data for 3000 examinees were obtained for the 40-item ACT Assessment Mathematics Usage subtest. The first 2000 cases were used to obtain item parameter estimates for both models. Using these estimates, 1PL and 3PL tailored tests were simulated using the response data for the remaining 1000 cases. Both tailored testing procedures employed maximum likelihood ability estimation and maximum information item selection. The two sets of ability estimates were then compared. In the second stage, response data for 3000 cases were simulated using the 3PL item parameter estimates from the first stage as true parameters, with true abilities selected from the standard normal distribution. The first 2000 cases were used for 1PL and 3PL calibration of the items, and the remaining 1000 cases were used to simulate 1PL and 3PL tailored tests. The two sets of ability estimates were compared to each other and to the true ability parameters. Results of both stages of the study indicated that the 1PL and 3PL tailored tests yielded highly correlated ability estimates and that, in terms of ability estimation, there was no apparent advantage to using one model over the other. Because the 1PL procedure was less expensive to use, it was the recommended model for this application.




CONTENTS

Introduction
Comparison of 1PL and 3PL Tailored Testing Procedures
    Method
        Models
        Estimation Programs
        Tailored Testing Procedures
        Design
        Data
        Analyses
    Results
        Real Data Analyses
            Item Pool Calibration
            Ability Estimates
            Average Test Length
            Nonconvergence
        Simulation Data Analyses
            Item Pool Calibration
            Ability Estimates
            Average Test Length
            Nonconvergence
    Discussion
        The Application
        Real Data Analyses
            Item Pool Calibration
            Ability Estimates
            Average Test Length
            Nonconvergence
        Simulation Data Analyses
            Item Pool Calibration
            Ability Estimates
            Average Test Length
            Nonconvergence
    Summary and Conclusions
References


In a second study, reported by Koch and Reckase (1979), 1PL and 3PL tailored testing procedures were applied to a multidimensional achievement test. Results of this study indicated very poor performance for both procedures, primarily because of small sample sizes, poor linking procedures, and poor selection of the stepsize and initial ability estimates for the maximum likelihood estimation procedure.

A study reported by McKinley and Reckase (1980) attempted to correct the problems encountered in the Koch and Reckase studies. Close attention was paid to appropriate item parameter linking and to the selection of the operating characteristics of the procedures. The results of this study indicated that both models could be applied quite successfully to tailored achievement testing if correctly implemented. Both the 1PL and 3PL reliabilities were higher than the reliability of a classroom test over the same material. The 3PL procedure yielded better fit to the data than the 1PL procedure, and it also yielded higher test information. This study concluded that for tailored achievement testing the 3PL model was the model of choice. However, the test used in this study was highly multidimensional, and it is unclear how well the results generalize to less multidimensional achievement tests.

Urry (1970, 1977) also concluded that the 3PL model was the model of choice. Through a series of simulation studies, Urry found that tailored testing becomes less effective when a model with an insufficient number of parameters is used. He concluded that construct validity decreases as a function of the degree of degeneracy of the model, and that the 1PL model was particularly inappropriate for use with multiple-choice items because it did not portray multiple-choice response data with fidelity (Urry, 1977).

This review of previous research indicates that if careful attention is paid to all components of the tailored testing procedure, both 1PL and 3PL tailored testing can be successful. The 3PL model tends to yield higher reliabilities and test information than the 1PL model, but it is more prone to complications such as nonconvergence. The research also indicates that the 3PL model yields better fit to multidimensional data. Thus, the results of these studies tend to favor the 3PL model. However, these results were obtained using relatively large item pools, and it is unclear from these studies what results would be obtained using smaller item pools. The purpose of this study was to compare the 1PL and 3PL models in a tailored achievement testing application for which a relatively small item pool is available.


Method 


Models 

The two models selected for this study were the one-parameter logistic (1PL) and the three-parameter logistic (3PL) models. The 1PL model is given by


$$P(x_{ij}) = \frac{\exp[(\theta_j - b_i)x_{ij}]}{1 + \exp(\theta_j - b_i)}$$


where $\theta_j$ is the ability parameter for examinee j, $b_i$ is the difficulty parameter for item i, $x_{ij}$ is the observed score (0 or 1) on item i for examinee j, and $P(x_{ij})$ is the probability of response $x_{ij}$ to item i by examinee j. The 3PL model is given by


$$P(x_{ij} = 1) = c_i + (1 - c_i)\frac{\exp[Da_i(\theta_j - b_i)]}{1 + \exp[Da_i(\theta_j - b_i)]}$$

where $c_i$ is the pseudo-guessing parameter for item i, $a_i$ is the discrimination parameter for item i, D is a scaling constant (conventionally 1.7), $P(x_{ij} = 1)$ is the probability of a correct response to item i by examinee j, and the remaining terms are as previously defined.
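The following is a minimal Python sketch of the two response functions above. It assumes D = 1.7, which is the conventional value; the text does not state the constant that was used. The function names are illustrative only.

```python
import math

# Assumption: D = 1.7, the conventional logistic scaling constant.
D = 1.7


def p_1pl(theta: float, b: float, x: int) -> float:
    """Probability of observed score x (0 or 1) under the 1PL model as written above."""
    return math.exp((theta - b) * x) / (1.0 + math.exp(theta - b))


def p_3pl_correct(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model."""
    z = D * a * (theta - b)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))


if __name__ == "__main__":
    # When ability equals difficulty, the 1PL gives 0.5 and the 3PL gives
    # c + (1 - c) / 2, i.e. halfway between the guessing floor and 1.
    print(p_1pl(0.0, 0.0, 1))
    print(p_3pl_correct(0.0, 1.0, 0.0, 0.25))
```

The two 1PL expressions, for x = 0 and x = 1, sum to one. The 3PL curve never falls below its pseudo-guessing floor c.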

Estimation Programs

For both the 1PL and 3PL models, parameters were estimated using the LOGIST program (Wingersky, Barton, and Lord, 1982). For the 1PL model, the pseudo-guessing parameter was held fixed at 0.0, and the discrimination parameter was held fixed at a common value computed by the LOGIST program. To check the 1PL estimates obtained from LOGIST, they were compared to parameter estimates obtained for the same data using the MAX program (Wright and Panchapakesan, 1969), which was designed for use with the 1PL model. Since the results obtained from the two programs were almost identical, LOGIST was used throughout the study. Using LOGIST for both models avoided problems due to different parameter estimate scales. For both models the scales were based on the ability estimate distributions.

Tailored Testing Procedures

Tailored testing procedures have three main components: an item selection procedure, an ability estimation procedure, and a stopping rule. In this study both the 1PL and 3PL procedures selected items to maximize the value of the information function (Birnbaum, 1968) at the most recent ability estimate. The information for each item at the examinee's current ability estimate was computed, and the item with the greatest information at that ability estimate was administered. The information had to be greater than 0.226 for the 1PL procedure and greater than 0.450 for the 3PL procedure. These values were selected on the basis of several trial runs so as to yield approximately equal average test lengths for the two models. For both procedures, the maximum test length allowed was 20 items.
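Below is a minimal sketch of maximum-information item selection with a minimum-information floor. It is written for the 3PL case; a 1PL item is entered with the common a-value and c = 0. The sketch assumes Birnbaum's 3PL item information formula and D = 1.7. The helper names and the toy pool are hypothetical. The common a-value of 0.561 is borrowed from the real-data calibration reported in the Results.

```python
import math
from typing import List, Optional, Set, Tuple

# Assumption: D = 1.7 (conventional scaling constant; not stated in the text).
D = 1.7

Item = Tuple[float, float, float]  # (a, b, c); a 1PL item has the common a and c = 0


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta: float, a: float, b: float, c: float) -> float:
    """Birnbaum's 3PL item information; reduces to (D*a)^2 * P * (1 - P) when c = 0."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def select_next_item(theta_hat: float, pool: List[Item], used: Set[int],
                     min_info: float) -> Optional[int]:
    """Index of the unused item with maximum information at theta_hat, or None
    if no unused item has information greater than min_info."""
    best: Optional[int] = None
    best_info = min_info
    for i, (a, b, c) in enumerate(pool):
        if i in used:
            continue  # no item may be administered twice
        info = item_information(theta_hat, a, b, c)
        if info > best_info:
            best, best_info = i, info
    return best


if __name__ == "__main__":
    # Hypothetical 1PL pool; 0.221 is the 1PL starting estimate and 0.226 the floor.
    pool = [(0.561, b, 0.0) for b in (-1.0, 0.2, 0.25, 1.5)]
    print(select_next_item(0.221, pool, set(), 0.226))
```

Under the assumed D = 1.7, a single 1PL item with a = 0.561 has peak information (1.7 × 0.561)²/4 ≈ 0.227. The 0.226 floor therefore admits only items whose difficulty lies within roughly 0.16 of the current ability estimate.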

Prior to testing, initial estimates of ability were assigned to set the starting points in the item pool. The initial ability estimates for this study were set to 0.221 for the 1PL procedure and 0.420 for the 3PL procedure. These values are near the medians of the item pool difficulty parameter distributions. The first item was then selected to maximize information at the initial ability estimate. The examinee's response to that item was then simulated as follows.

For the first part of the study, response data came from a fixed-length, non-tailored test composed of all the items in the pool. These items had been administered in paper and pencil form to all of the examinees used in this study. An examinee's response to an item in the tailored tests was that examinee's actual response to the item on the paper and pencil test.

For the second part of the study, simulated response data were generated for each examinee for each item in the pool. These data were generated according to the 3PL model, using the 3PL item parameter estimates obtained for the real response data and examinee abilities selected at random from a standard normal distribution. These responses were used regardless of whether a 1PL-based or a 3PL-based tailored test was used.
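The response-generation step for the second part of the study can be sketched as follows. True abilities are drawn from N(0, 1), and each 0/1 response is a Bernoulli draw with the 3PL probability of success. The pool values, the seed, D = 1.7, and the function names are assumptions for illustration. The slicing at the end mirrors the split of 2000 calibration cases and 1000 tailored-testing cases described in the Design section.

```python
import math
import random
from typing import List, Tuple

# Assumption: D = 1.7 (conventional scaling constant; not stated in the text).
D = 1.7

Item = Tuple[float, float, float]  # (a, b, c) treated as true 3PL parameters


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def simulate_3pl_responses(pool: List[Item], n_examinees: int,
                           seed: int = 0) -> Tuple[List[float], List[List[int]]]:
    """Draw true abilities from N(0, 1), then a 0/1 response to every item."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    responses = [
        [1 if rng.random() < p_correct(theta, a, b, c) else 0 for a, b, c in pool]
        for theta in thetas
    ]
    return thetas, responses


if __name__ == "__main__":
    # Hypothetical three-item pool; the study used the 40 real-data 3PL estimates.
    pool = [(1.0, -0.5, 0.15), (0.9, 0.4, 0.20), (1.3, 1.2, 0.10)]
    thetas, data = simulate_3pl_responses(pool, 3000, seed=1)
    calibration, tailored = data[:2000], data[2000:]  # 2000 / 1000 split
    print(len(calibration), len(tailored))
```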

Once an examinee's response to an item had been obtained, a new estimate of ability was computed. A fixed stepsize was added to the old ability estimate if the response was correct and subtracted if the response was incorrect. This fixed-stepsize procedure was used until a maximum likelihood ability estimate could be obtained, that is, once both correct and incorrect responses had been obtained. The stepsize used was 0.300 for both procedures. Each new item was selected to maximize the information at the new ability estimate, with the restriction that no item could be used more than once.
Two stopping rules were used for the tailored testing procedures. A test was terminated when no items remained in the item pool with information at the current ability estimate greater than the minimum specified above, or when 20 items had been administered.
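The sketch below puts these pieces together into one tailored test:

- maximum-information selection above a floor;
- a fixed ±0.300 step until the response pattern is mixed;
- maximum likelihood estimation thereafter;
- the two stopping rules.

The MLE here uses Fisher scoring, and the ±4.0 clamp follows the notes to the ability tables. Both choices, together with D = 1.7 and all function names, are assumptions of this sketch rather than details of the original programs.

```python
import math
from typing import List, Sequence, Tuple

# Assumptions: D = 1.7, Fisher scoring for the MLE, and a +/-4.0 clamp.
D = 1.7
STEP = 0.300     # fixed stepsize used by both procedures
MAX_ITEMS = 20   # maximum test length
BOUND = 4.0      # ability limits, as in the notes to the ability tables

Item = Tuple[float, float, float]  # (a, b, c)


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def information(theta: float, a: float, b: float, c: float) -> float:
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def mle(theta: float, items: List[Item], resp: List[int], iters: int = 100) -> float:
    """Maximum likelihood ability estimate by Fisher scoring, clamped to +/-BOUND."""
    for _ in range(iters):
        score = total_info = 0.0
        for (a, b, c), u in zip(items, resp):
            p = p_correct(theta, a, b, c)
            w = (p - c) / (1.0 - c)
            score += D * a * (u - p) * w / p
            total_info += (D * a) ** 2 * ((1.0 - p) / p) * w ** 2
        delta = score / total_info
        theta = max(-BOUND, min(BOUND, theta + delta))
        if abs(delta) < 1e-6:
            break
    return theta


def tailored_test(pool: List[Item], answers: Sequence[int], theta0: float,
                  min_info: float) -> Tuple[float, int]:
    """Run one tailored test. answers holds the examinee's 0/1 response to every
    pool item (the paper and pencil responses in the real-data stage)."""
    theta, used, given = theta0, [], []
    while len(used) < MAX_ITEMS:  # stopping rule: 20-item cap
        eligible = [(information(theta, *pool[i]), i)
                    for i in range(len(pool)) if i not in used]
        eligible = [pair for pair in eligible if pair[0] > min_info]
        if not eligible:  # stopping rule: no sufficiently informative item left
            break
        _, item = max(eligible)
        used.append(item)
        given.append(answers[item])
        if 0 < sum(given) < len(given):  # mixed pattern: an MLE exists
            theta = mle(theta, [pool[i] for i in used], given)
        else:  # fixed stepsize until then
            theta += STEP if given[-1] == 1 else -STEP
    return theta, len(used)


if __name__ == "__main__":
    pool = [(0.561, 0.0, 0.0)] * 30  # hypothetical pool of identical 1PL items
    print(tailored_test(pool, [1, 0] * 15, 0.0, 0.2))
```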

Design

This study employed a two-stage design. One stage involved the use of real data, and the other involved simulated data. In the first stage of the study, response data were obtained for a large sample on a relatively short paper and pencil test. Part of the large sample was then used to calibrate the items on the test using both the 1PL and 3PL models. Using the resulting item parameter estimates, 1PL and 3PL tailored tests were simulated for the examinees not included in the calibration sample. The examinees' responses to the items in the tailored tests were the same responses they had made to those items when taking the paper and pencil test.

In the second stage of the study, the item parameter estimates obtained from the 3PL calibration of the paper and pencil test were used as true parameters. Together with true abilities selected at random from the standard normal distribution, they were used to generate simulated response data that fit the 3PL model. Data were generated for a large sample for all the items from the paper and pencil test. The procedure used for the real data part of the study was then repeated using these simulated data.

Data

For the real data part of the study, response data for the 40-item Mathematics Usage subtest of the ACT Assessment (The American College Testing Program, 1982) were obtained for 3000 cases from the October 1982 administration of the ACT Assessment (Form 23B). For the second stage of the study, data were simulated for 40 items and 3000 cases. For both stages, then, rather small item pools were used.

Analyses

The analyses performed in this study consisted primarily of computing and comparing correlations. For both the real and the simulation data, the 1PL and 3PL tailored test ability estimates were compared by computing the correlation between them. For the simulation data, the two sets of ability estimates obtained from the tailored tests were also compared to the true abilities used to generate the data, again by means of correlations.
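The basic computation behind these analyses is the Pearson correlation between two sets of ability estimates for the same examinees. Below is a minimal sketch that computes it for every pair of named estimate sets. The dictionary keys and toy values are hypothetical and are not data from the study.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(estimates: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlation for every pair of named estimate sets (upper triangle only)."""
    return {(a, b): pearson_r(estimates[a], estimates[b])
            for a, b in combinations(estimates, 2)}


if __name__ == "__main__":
    est = {  # hypothetical toy values
        "true": [-1.2, -0.4, 0.1, 0.8, 1.5],
        "tailored_1pl": [-1.0, -0.6, 0.3, 0.7, 1.9],
        "tailored_3pl": [-1.6, -0.2, 0.0, 1.1, 1.4],
    }
    for pair, r in correlation_matrix(est).items():
        print(pair, round(r, 3))
```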


Results 


Real Data Analyses

Item Pool Calibration. The first analysis performed on the real data was the calibration of the items for use as a tailored testing item pool. The calibration, which was based on response data for the first 2000 examinees, was performed in three different ways. The first two calibrations were performed for the 1PL model using the LOGIST and MAX programs, while the third was performed for the 3PL model using LOGIST. The MAX and LOGIST 1PL item difficulty parameter estimates had a correlation of 0.999, as did the ability estimates obtained from the two programs. This comparison was performed to determine whether the LOGIST program could be used for both models throughout the study. The findings indicated that it could, which simplified the problem of placing the estimates from the two models on the same scale.

The item parameter estimate distributions obtained for the two models using LOGIST are shown in Figure 1, and these distributions are summarized by the statistics shown in Table 1. As can be seen, most of the 3PL discrimination parameter estimates were .60 or higher, so most of the items were of fairly high quality. The 3PL difficulty parameter estimate distribution, however, shows that the items are appropriate only for a limited range of ability, since most of the item difficulty estimates fall in the range from -1.0 to 1.75. Most of the guessing parameter estimates are .3 or less, with only two items having guessing parameter estimates greater than .3. From these data it would appear that these items form a fairly high quality item pool for tailored testing, apart from the limited range of difficulty.

For the 1PL model, the LOGIST program assigned all items a discrimination value of 0.561. The pseudo-guessing parameter was, of course, 0.0. The 1PL difficulty parameter estimate distribution is somewhat different from the 3PL difficulty distribution, although the two sets of estimates had a correlation of .88. The biggest difference is that the bulk of the 1PL estimates is shifted downward. Most of the 1PL difficulty parameter estimates fall within the same range as the 3PL estimates but lie toward the negative end of that range. Still, within that range the items form an item pool of fairly high quality.
Table 1

Descriptive Statistics of Item Parameter
Estimates for the Real Data

                         1PL              3PL
Statistic                  b          a       b       c
Mean                    0.03       0.98    0.46    0.17
Median                  0.22       0.90    0.41    0.16
S.D.                    0.91       0.34    1.10    0.08
Skewness               -0.24       0.40   -0.20    1.14
Kurtosis                0.19      -0.04    0.99    1.19
Low Value              -2.07       0.31   -2.12    0.08
High Value              2.04       1.81    3.15    0.41

Figure 2 shows the test information function for the item pool based on the 1PL item parameter estimates, while Figure 3 shows the test information function based on the 3PL estimates. As can be seen from Figure 3, the 3PL curve is negatively skewed and centered around 1.0, thus yielding more information at the positive end of the ability scale. The 1PL curve, on the other hand, is not skewed and is centered around 0.2. It would appear, then, that the 1PL item parameter estimates are appropriate for a wider range of ability than the 3PL estimates are. Of course, the ability scales are not exactly comparable because they are based on different item parameters.

Ability Estimates. For those examinees not included in the calibration sample, four different estimates of ability were computed. For each examinee, a 1PL and a 3PL ability estimate were obtained from the simulated tailored tests. In addition, ability estimates for each examinee for both models were obtained from LOGIST, using the item parameter estimates and the examinee's responses to the 40-item paper and pencil test. This made possible not only a comparison of the two tailored testing procedures, but also a comparison of the tailored testing procedures with the paper and pencil tests.

Table 2 summarizes the distributions of the ability estimates obtained for both models from the tailored tests and from the paper and pencil tests. Table 3 shows the intercorrelation matrix for these four sets of ability estimates. As can be seen from these data, the two sets of tailored test ability estimates were similar, with a correlation of 0.77. However, there were some differences in the two distributions.
Figure 2

The Test Information Function for the 1PL
Item Parameter Estimates for the Real Data

Figure 3

The Test Information Function for the 3PL
Item Parameter Estimates for the Real Data


For instance, the skewness value of -0.97 for the 3PL ability estimate distribution was significantly different from zero (with a sample size of 1000, the standard error of the skewness coefficient is 0.08), while the 1PL ability estimate distribution was not significantly skewed. Also, the kurtosis value of 1.96 for the 3PL ability estimate distribution was significant (standard error = 0.16), while the kurtosis value of the 1PL ability estimate distribution was not.
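The standard errors quoted above agree closely with the common large-sample approximations √(6/N) for skewness and √(24/N) for kurtosis. For N = 1000, these give about 0.077 and 0.155. The sketch below assumes those approximations and a two-sided 5% criterion (|z| > 1.96); the function names are illustrative.

```python
import math


def se_skewness(n: int) -> float:
    """Large-sample standard error of the sample skewness: sqrt(6 / n)."""
    return math.sqrt(6.0 / n)


def se_kurtosis(n: int) -> float:
    """Large-sample standard error of the sample excess kurtosis: sqrt(24 / n)."""
    return math.sqrt(24.0 / n)


def is_significant(statistic: float, standard_error: float, critical: float = 1.96) -> bool:
    """Two-sided test at the 5% level: |statistic / SE| > 1.96."""
    return abs(statistic / standard_error) > critical


if __name__ == "__main__":
    n = 1000
    print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))
    # Tailored test values from Table 2, 1PL then 3PL.
    print(is_significant(0.10, se_skewness(n)), is_significant(-0.97, se_skewness(n)))
    print(is_significant(0.21, se_kurtosis(n)), is_significant(1.96, se_kurtosis(n)))
```

Applied to the Table 2 values, the sketch flags the 3PL tailored test skewness (-0.97) and kurtosis (1.96) as significant. It finds the corresponding 1PL values (0.10 and 0.21) not significant, which matches the text.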

Table 2

Descriptive Statistics of Ability Parameter
Estimates for the Real Data

                       Tailored Tests   Paper and Pencil Tests
Statistic                1PL     3PL          1PL     3PL
Mean                    0.15    0.01         0.21    0.11
Median                  0.14    0.23         0.16    0.25
S.D.                    1.36    1.40         1.13    1.18
Skewness                0.10   -0.97         0.74   -0.35
Kurtosis                0.21    1.96         3.48    4.39
Low Value              -3.65   -4.00        -2.92   -4.00
High Value              6.22    6.42         4.00    4.00
Mean Test Length       12.84   12.16        40.00   40.00
S.D. of Test Length     4.51    4.73         0.00    0.00

Note. For the LOGIST calibrations, arbitrary minimums and maximums of -4.00 and 4.00 were set on the ability estimates. The same limits were placed on the tailored tests except in those cases where all items were answered correctly or all were answered incorrectly.


Table 3

Intercorrelation Matrix for Ability Parameter
Estimates for the Real Data

Ability                Tailored Tests   Paper and Pencil Tests
Estimate                 1PL     3PL          1PL     3PL
Tailored      1PL       1.00
              3PL       0.77    1.00
Paper/Pencil  1PL       0.89    0.81         1.00
              3PL       0.87    0.86         0.95    1.00

The 1PL and 3PL ability estimates from the paper and pencil test had a correlation of 0.95. Both distributions were leptokurtic (kurtosis = 3.48 for the 1PL estimates, 4.39 for the 3PL estimates), and the two distributions had similar means and standard deviations. The only real difference between these two distributions was that the 3PL distribution was significantly negatively skewed (skewness = -0.35), while the 1PL distribution was significantly positively skewed (skewness = 0.74).

The two sets of tailored test ability estimates were fairly similar to the paper and pencil test ability estimates. The two sets of 1PL estimates had a correlation of 0.89, and the two sets of 3PL estimates had a correlation of 0.86. A comparison of these two correlations via Fisher's r-to-z transformation yields z = 2.20, p < .05, indicating that the 1PL correlation was significantly higher than the 3PL correlation. Interestingly, two correlations with the 3PL paper and pencil test estimates did not differ significantly: that of the 3PL tailored test ability estimates (r = 0.86) and that of the 1PL tailored test ability estimates (r = 0.87). The 1PL tailored test ability estimates did have a significantly higher correlation with the 1PL paper and pencil test estimates than did the 3PL tailored test ability estimates (r = 0.89 versus r = 0.81).


Average Test Length. The average test length for the 1PL tailored tests was 12.8 items, while the average 3PL tailored test was 12.2 items long. This difference is of little or no practical importance, except as an indication that the attempt to produce tests of equal length for the two models was successful. Of some importance is the finding that the 1PL tailored tests required approximately one half of the CPU time required by the 3PL procedure. Of course, if this difference had no significant impact on response time, then it too is of no practical significance.

Nonconvergence. For the 1PL procedure there was no nonconvergence. For the 3PL procedure, however, there was a 4.9% nonconvergence rate. Examinees for whom there was nonconvergence were assigned an ability estimate of 4.0 or -4.0. Of the cases where there was nonconvergence, 96% were at the low end of ability. This is consistent with the finding that the 3PL test information curve was negatively skewed and shifted toward the positive end.

Nonconvergence here means that the tailored testing procedure was not able to compute an ability estimate for an examinee. This could happen because the examinee answered all the items correctly or all the items incorrectly. It could also happen if the examinee's ability estimate drifted out of the range for which there were appropriate items before both an incorrect and a correct response were obtained. In such a case, the test would be terminated at 20 items, or when both a correct and an incorrect answer were obtained.
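The first cause of nonconvergence can be illustrated directly. When every response is correct, the log-likelihood increases monotonically in ability, and when every response is incorrect, it decreases monotonically; in either case no finite maximum likelihood estimate exists. The sketch below assumes D = 1.7 and uses hypothetical names. It checks this property and applies the ±4.0 assignment described above. It does not detect the second cause, which is drift out of the range covered by the pool.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Assumption: D = 1.7 (conventional scaling constant; not stated in the text).
D = 1.7

Item = Tuple[float, float, float]  # (a, b, c)


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def log_likelihood(theta: float, items: List[Item], responses: Sequence[int]) -> float:
    total = 0.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def fallback_estimate(responses: Sequence[int], bound: float = 4.0) -> Optional[float]:
    """+bound for an all-correct pattern, -bound for all-incorrect, None if mixed."""
    if not responses:
        raise ValueError("no responses yet")
    if all(u == 1 for u in responses):
        return bound
    if all(u == 0 for u in responses):
        return -bound
    return None  # mixed pattern: a finite MLE can be sought


if __name__ == "__main__":
    items = [(1.0, -0.5, 0.2), (1.0, 0.5, 0.2)]  # hypothetical items
    # For an all-correct pattern the log-likelihood keeps rising with ability.
    print([round(log_likelihood(t, items, [1, 1]), 4) for t in (0.0, 1.0, 2.0, 3.0)])
    print(fallback_estimate([1, 1]), fallback_estimate([0, 0]), fallback_estimate([1, 0]))
```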

Simulation Data Analyses

Item Pool Calibration. The first step in the simulation data stage of this study was the generation of data to fit the 3PL model. The true item parameters used for these data were the 3PL item parameter estimates obtained for the real data used in the first part of the study. Data were generated for 3000 cases, using true ability parameters randomly selected from the standard normal distribution.

Once these data were generated, the items were calibrated for both the 1PL and 3PL models using the first 2000 cases. The distributions of the obtained item parameter estimates are shown in Figure 4. These distributions are summarized by the statistics shown in Table 4.

Table 4

Descriptive Statistics of Item Parameter
Estimates for the Simulation Data

Statistic
Mean                    0.00
Median                  0.16
S.D.                    0.90
Skewness               -0.31
Kurtosis                0.38
Low Value              -2.20
High Value              2.00

With few exceptions, these distributions are very much like the distributions of the item parameter estimates obtained for the real data. The only real differences were in the skewness of the 3PL a-values, which went from slightly positively skewed to not significantly skewed, and in the kurtosis of the 3PL b-values, which increased for the simulation data.


One other important difference was that the 1PL calibration assigned the items an a-value of 0.60. Since this was higher than the value for the real data (0.56), the test information curve for the 1PL model was expected to be higher for the simulation data than for the real data. It was unclear what effect this would have on the simulated 1PL tailored tests, except that it would probably increase the average test length.
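The expectation stated above can be made concrete with a short calculation. It assumes that a 1PL item (c = 0, common a) has the usual logistic information function, which peaks at θ = b:

$$I_i(\theta) = D^2 a^2 P_i(\theta)\,[1 - P_i(\theta)] \le \frac{D^2 a^2}{4}$$

$$\frac{I_{\max}^{\text{sim}}}{I_{\max}^{\text{real}}} = \frac{D^2 (0.60)^2 / 4}{D^2 (0.561)^2 / 4} = \left(\frac{0.60}{0.561}\right)^2 \approx 1.14$$

Each item's peak information therefore rises by about 14%, whatever value of D is used. This is a statement about single-item peaks only; the height of the test information curve also depends on the distribution of the difficulty estimates.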

Table 5 shows the intercorrelation matrix for the true and estimated item parameters for the simulation data. As can be seen, the 3PL estimates were quite similar to the true parameters. The correlations of the true and estimated 3PL item parameters were 0.89 for the a-values, 0.99 for the b-values, and 0.92 for the c-values. The correlation of the 1PL b-values with the true b-values was 0.88, and the correlation of the 1PL and 3PL b-value estimates was 0.88.

Table 5

Intercorrelation Matrix for the True and Estimated
Item Parameters for the Simulation Data

                     True              1PL       3PL Estimates
Parameter          a      b      c        b        a      b      c
True    a       1.00   0.25   0.10     0.45     0.89   0.21  -0.09
        b              1.00   0.40     0.88     0.27   0.99   0.29
        c                     1.00     0.11     0.19   0.34   0.92
1PL     b                              1.00     0.41   0.88  -0.04
3PL     a                                       1.00   0.23   0.08
        b                                              1.00   0.26
        c                                                     1.00

Figures 5 and 6 show the test information curves for the 1PL and 3PL item parameter estimates, respectively. As was the case with the real data, the 3PL information curve is shifted toward the positive end of the ability scale and is centered around .8. The 1PL curve, on the other hand, is centered around 0.0. The 1PL pool once again appears to be appropriate for a wider range of ability than the 3PL pool is, especially at the lower end of the ability scale. As was predicted from the item calibration results, the 1PL test information curve was higher for the simulation data than for the real data. An unexpected result was that the 3PL test information curve was also higher for the simulation data than for the real data. This was probably a result of the fact that the simulation data were generated to fit the 3PL model.

Ability Estimates. Four sets of ability estimates were once again computed for the 1000 examinees not included in the calibration sample. For each simulated examinee, 1PL and 3PL ability estimates were obtained from the simulated tailored tests. Estimates for both models were also obtained from LOGIST runs on the simulated 40-item fixed-length test, using the item parameter estimates from the calibration of the simulation data. Thus, all the comparisons made with the real data results could also be made with the simulation data results. Because these were simulation data and the true ability parameters were known, the ability estimates obtained for these data could also be compared to the true abilities.

The statistics shown in Table 6 summarize the true ability parameter distribution, as well as all of the ability estimate distributions obtained using the simulation data. Table 7 shows the intercorrelation matrix for the true and estimated abilities for the simulation data. The patterns appearing in these data are much like those found for the real data. For these data, however, the correlations are all higher than for the real data, with one exception. The correlation between the 1PL and 3PL (simulated) paper and pencil test ability estimates was lower for the simulation data (0.928 versus 0.946 for the real data).

The 1PL tailored test ability estimates had a correlation of 0.931 with the 1PL simulated paper and pencil test estimates. This was significantly higher than the correlation of 0.826 obtained between the 3PL tailored test estimates and the 1PL paper and pencil test estimates (z = 10.954, p < .01). The 1PL and 3PL tailored test estimates had correlations of 0.920 and 0.854, respectively, with the 3PL paper and pencil test estimates. The difference between these two correlations is significant (z = 7.113, p < .01), indicating that the 1PL correlation was significantly greater than the 3PL correlation.
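The r-to-z comparisons can be sketched as below. The reported simulation-data z values (10.954, 7.113, and 5.452) are closely reproduced by the independent-samples form of the test with n = 1000 per correlation, so that form is assumed here. For example, the first two comparisons give about 10.95 and 7.11. The two correlations in each comparison come from the same examinees and share a variable, so a test for dependent correlations would be a reasonable alternative. The function name is illustrative.

```python
import math


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> float:
    """z statistic for the difference between two correlations via Fisher's
    r-to-z transformation, treating the correlations as independent."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z = 0.5 * ln((1 + r) / (1 - r))
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se


if __name__ == "__main__":
    # Simulation-data correlations reported above, 1000 examinees each.
    print(round(compare_correlations(0.931, 1000, 0.826, 1000), 2))
    print(round(compare_correlations(0.920, 1000, 0.854, 1000), 2))
```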

The inclusion of the true ability parameters in the analyses of the simulation data resulted in a very interesting finding. The 1PL and 3PL paper and pencil test estimates had correlations with the true parameters that were not significantly different (0.894 for the 3PL estimates, 0.900 for the 1PL estimates). In contrast, the 1PL tailored test ability estimates correlated significantly more highly with the true abilities than the 3PL tailored test ability estimates did (r = 0.883 for the 1PL estimates, 0.816 for the 3PL estimates; z = 5.452, p < .01). This was rather surprising, since the simulation data were generated to fit the 3PL model. Just as surprising was a second finding. The correlation of the 1PL tailored test ability estimates with the true abilities was not significantly less than the correlations between the true abilities and the paper and pencil test estimates, even though the maximum length of the tailored tests was only half the length of the paper and pencil tests.

Table 6

Descriptive Statistics of True and Estimated Abilities for the Simulation Data

                                 Tailored Tests  Paper and Pencil Tests
Statistic                True      1PL      3PL      1PL      3PL
Mean                    -0.01    -0.08    -0.25     0.02    -0.09
Median                   0.00    -0.07     0.00    -0.10     0.03
S.D.                     1.04     1.30     1.48     1.11     1.22
Skewness                -0.01     0.32    -0.58     1.11    -0.24
Kurtosis                 0.14     0.86     1.52     4.27     4.04
Low Value               -3.82    -3.61    -5.58    -2.47    -4.00
High Value               3.74     6.22     6.42     4.00     4.00
Mean Test Length                 17.90    13.51    40.00    40.00
S.D. of Test Length               4.05     5.77     0.00     0.00

Note. For the LOGIST calibrations, arbitrary minimums and
maximums of -4.00 and 4.00, respectively, were set on the
ability estimates. The same limits were placed on the
tailored tests except in those cases where all items were
answered correctly or all were answered incorrectly.
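
The statistics in Table 6 can be computed from a vector of estimates roughly as follows. The report does not state its formulas, so this sketch assumes three things: moment-based skewness, excess kurtosis (raw kurtosis minus 3, which is 0 for a normal distribution), and a divide-by-n standard deviation. The near-zero kurtosis reported for the normally distributed true abilities is consistent with an excess measure, but that is an inference. The `clamp` helper applies the ±4.00 bounds described in the note.

```python
import math
import statistics

def clamp(theta: float, lo: float = -4.0, hi: float = 4.0) -> float:
    """Apply the arbitrary ability bounds described in the note to Table 6."""
    return max(lo, min(hi, theta))

def describe(values):
    """Table 6 style summary (formulas assumed; see the lead-in)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # central moments
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": math.sqrt(m2),              # divide-by-n SD (assumption)
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,   # excess kurtosis (assumption)
        "low": min(values),
        "high": max(values),
    }

# Hypothetical estimates, clamped before summarizing.
summary = describe([clamp(t) for t in [-5.58, -0.3, 0.1, 0.8, 6.42]])
```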

Table 7

Intercorrelation Matrix for True and Estimated Abilities for the Simulation Data

                                 Tailored Tests  Paper and Pencil Tests
Ability Estimate         True      1PL      3PL      1PL      3PL


Average Test Length. The average length of the 3PL
tailored tests for the simulation data was 13.5 items, and
the average 1PL tailored test was 17.9 items long. Both of
these averages were greater for the simulation data than for
the real data, as was predicted from the results of the test
information curve analyses. The average 3PL test length
increased by 1.3 items, while the average 1PL test length
increased by 5.1 items. The increased length of the 1PL
tests for the simulation data could at least partially
explain why the 1PL tailored test estimates had higher
correlations with the true abilities and with the paper and
pencil test estimates than the 3PL tailored test estimates
did. It should be pointed out, however, that despite the
longer average length of the 1PL tailored tests, the 3PL
procedure required half again as much CPU time (about 1.5
times as much) as the 1PL procedure.

Nonconvergence. The 1PL procedure had a 0.3% nonconvergence
rate, while the 3PL procedure had a 5.9% nonconvergence
rate. For the 1PL procedure, all of the nonconvergence cases
(three of them) were at the positive end of the ability
scale. For the 3PL procedure, 90% of the nonconvergence
cases were at the low end of the ability scale. As was the
case with the real data, examinees for whom there was
nonconvergence were assigned an ability estimate of 4.0 or
-4.0.
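
Both procedures used maximum likelihood ability estimation, and any estimate that failed to converge was replaced by 4.0 or -4.0. The sketch below shows one way to do this, using Fisher scoring under the 3PL model. Setting c = 0 with a common a gives the 1PL case. The report does not give its convergence criteria, so several details here are assumptions: the tolerance, the iteration limit, the scaling constant D = 1.7, and the rule that declares nonconvergence as soon as the estimate leaves [-4, 4].

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def p3pl(theta, a, b, c):
    """3PL probability of a correct response; c = 0 with a common a is the 1PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def mle_theta(responses, items, start=0.0, tol=1e-4, max_iter=50, bound=4.0):
    """Fisher-scoring MLE of ability from 0/1 responses to items [(a, b, c)].

    Returns (theta, converged). Nonconvergent cases are assigned +bound or
    -bound according to the direction the estimate was moving (assumed rule).
    """
    theta = start
    for _ in range(max_iter):
        score = info = 0.0
        for u, (a, b, c) in zip(responses, items):
            p = p3pl(theta, a, b, c)
            dp = D * a * (p - c) * (1.0 - p) / (1.0 - c)  # dP/dtheta
            score += (u - p) * dp / (p * (1.0 - p))      # d log L / d theta
            info += dp * dp / (p * (1.0 - p))            # Fisher information
        step = score / info
        theta += step
        if abs(theta) > bound:  # estimate has run off the scale
            return math.copysign(bound, theta), False
        if abs(step) < tol:
            return theta, True
    return math.copysign(bound, theta), False
```

An all-correct or all-incorrect pattern has no finite maximum, so its estimate runs off the scale and is assigned the corresponding bound. Under the 3PL model, a nonzero c also makes the likelihood very flat at low abilities, which is one way low-end nonconvergence can arise.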


Discussion 

In recent years a number of studies reported in the
literature have addressed the issue of whether the 1PL model
or the 3PL model should be used in various tailored testing
applications. For tailored achievement testing, the
application of interest here, the research has tended to
favor the 3PL model. Because these studies are inconclusive
for applications involving small item pools, and because the
3PL model tends to be more expensive to use, this study was
conducted to determine, for a specific application, whether
there is sufficient advantage to using the 3PL model to
warrant the extra expense. The results of this study will
now be discussed, and afterwards some conclusions regarding
which model should be used for this application will be
presented. First, however, the specific application of
interest in this study will be described.

The  Application 


The specific application of interest here has several
characteristics that require special consideration. The
type of application of concern is an achievement testing
application. Achievement testing must be considered in a
different light than ability testing, because it is learning
rather than ability that is being measured. While ability
tests generally have learning components, they are
constructed to measure a single trait, and as such are
usually reasonably unidimensional. Achievement tests, on
the other hand, are not specifically directed at a single
trait. Moreover, achievement tests often are designed to
measure learning in a number of content areas. Therefore,
achievement tests typically are not unidimensional, and are
often highly multidimensional. The multidimensionality of
achievement tests causes problems for IRT, since most IRT
models assume unidimensionality.

One way to deal with the dimensionality problem when
measuring achievement via IRT is to treat the different
content areas separately. Individual content areas
typically are not unidimensional either, but they at least
afford a closer approximation to unidimensionality than do
multi-content-area tests. Treating content areas separately,
however, presents a new problem for tailored testing: a
single content area of a test may not include very many
items. Tailored testing procedures work best when the item
pool has a relatively large number of items, with
difficulties spread uniformly over the ability range (Urry,
1977). Building an item pool to meet those specifications
using only items from a single content area might be
difficult, and certainly would be time-consuming. It seems
likely, then, that at least in the early stages of a
tailored achievement testing program that treats content
areas separately, the item pools will be small.

There are at least two other ways to deal with the
multidimensionality of achievement tests in a tailored
testing application, but at this point neither is
practicable. One way would be to sort the test items into
unidimensional subsets and treat these subsets separately.
However, thus far there are no satisfactory procedures for
sorting items into unidimensional subsets when the items are
dichotomously scored, as achievement test items typically
are (Reckase, 1981). Even if such sorting could be done, the
problem of insufficient items in the pool would still be
present.

The other way of dealing with the multidimensionality
problem is to use a multidimensional model. Unfortunately,
no one has yet developed tailored testing procedures for a
multidimensional model. Therefore, this study took the
approach of using a unidimensional model with an individual
content area. The content area used was the math subtest of
the ACT Assessment Program. Using these items, a pool of 40
items was constructed, and with this 40-item pool a
comparison of the 1PL and 3PL models was conducted. The
results of that comparison will now be discussed, beginning
with the real data part of the study.

Real  Data  Analyses 

Item Pool Calibration. Probably the most significant
result from the item calibrations was the finding that the
3PL item parameter estimates yielded a test information
curve that was negatively skewed and centered around a point
on the positive end of the ability scale, while the 1PL item
parameter estimates yielded a test information curve that
was symmetric and centered around zero. From these results
it would be expected that the 3PL tailored tests would tend
to terminate prior to convergence for examinees with ability
on the lower end of the scale. Such a tendency would not be
expected for the 1PL tailored tests.
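
Test information is the sum of the item information functions. Under the 3PL model, item information is I(theta) = D^2 a^2 (P - c)^2 (1 - P) / [(1 - c)^2 P], which reduces to D^2 a^2 P Q when c = 0. The sketch below compares two hypothetical pools with identical a- and b-values that differ only in c. It illustrates one mechanism, not the report's actual curves: a nonzero lower asymptote removes information mainly at low abilities, which pushes the peak of the test information curve toward the positive end of the scale. D = 1.7 is assumed.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def item_info(theta, a, b, c=0.0):
    """3PL item information; c = 0 reduces it to the 1PL/2PL form D^2 a^2 P Q."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * (p - c) ** 2 * (1.0 - p) / ((1.0 - c) ** 2 * p)

def pool_info(theta, items):
    """Test information: sum of item information over items [(a, b, c), ...]."""
    return sum(item_info(theta, a, b, c) for a, b, c in items)

# Hypothetical pools: identical a and b values, differing only in c.
difficulties = [-2.0 + 0.1 * i for i in range(41)]
pool_1pl = [(1.0, b, 0.0) for b in difficulties]
pool_3pl = [(1.0, b, 0.2) for b in difficulties]
grid = [x / 10 for x in range(-40, 41)]
peak_1pl = max(grid, key=lambda t: pool_info(t, pool_1pl))
peak_3pl = max(grid, key=lambda t: pool_info(t, pool_3pl))
```

With these symmetric difficulties the 1PL curve peaks at zero, while the 3PL curve peaks above zero and keeps a smaller share of its information at the low end.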

Ability Estimates. The most important finding from the
analyses performed on the ability estimates obtained for the
real data was that the 1PL model performed as well as the
3PL model without requiring any additional items. The
correlation between the 1PL and 3PL tailored testing ability
estimates was fairly high (0.772), and the 1PL tailored test
estimates were just as highly correlated with the paper and
pencil test estimates as were the 3PL tailored test
estimates. From these data it appears that there is no
advantage to be gained from using the more complex (and
expensive) 3PL model.


Average Test Length. For the real data tailored test
simulations, the average test lengths for the 1PL and 3PL
tests were about the same. This is as it should be, since
the information cutoff values for the two procedures were
selected to produce tests of equal length.

Nonconvergence. There were no cases of nonconvergence for
the 1PL tailored test procedure. For the 3PL procedure
there was a 4.9% nonconvergence rate. Of those cases where
there was nonconvergence, 96% involved examinees at the low
end of the ability range. This is consistent with the
finding that the 3PL test information curve for the item
pool was negatively skewed. Clearly, nonconvergence is more
of a problem in this case for the 3PL procedure than for the
1PL procedure.


Simulation Data Analyses


Item Pool Calibration. What turned out to be one of the
most important results of the item calibrations was that,
for the 1PL calibration, LOGIST assigned to the items a
common a-value that was higher than the one assigned to the
items using the real data. This resulted in higher test
information for the 1PL model across the ability range. As a
result, the information cutoff for the 1PL procedure was
inappropriately low, which made the tests longer than
expected. The test information curve for the 3PL model was
also somewhat higher than for the real data, except at the
extremes. This would also be expected to increase the
average length of the 3PL tests, but not as much as for the
1PL tests. The 3PL curve was negatively skewed, as was the
case with the real data, which should once again have
resulted in some nonconvergence cases at the low end of the
ability scale.
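
In the explanation above, higher information produced longer tests. That outcome is consistent with a stopping rule that ends the test once the best remaining item supplies less information than a cutoff at the current ability estimate. The report does not spell out the rule, so this reading is an assumption. Under the 1PL model, maximum information selection simply picks the unused item whose difficulty is closest to the current estimate. The sketch makes several simplifications. It holds the estimate fixed, whereas a real tailored test re-estimates ability after every response. It also uses a hypothetical pool and cutoff. The 20-item cap reflects the tailored tests' maximum of half the 40-item paper and pencil test.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def info_1pl(theta, a, b):
    """1PL item information D^2 a^2 P Q with common discrimination a."""
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)

def administered(theta_hat, difficulties, a, cutoff, max_items=20):
    """Difficulties of the items given before stopping, with the ability
    estimate held at theta_hat (a simplification). Under the 1PL the most
    informative unused item is the one with difficulty closest to theta_hat;
    testing stops when that item's information is below `cutoff` (assumed
    rule) or when `max_items` items have been given."""
    by_information = sorted(difficulties, key=lambda b: abs(b - theta_hat))
    given = []
    for b in by_information:
        if len(given) == max_items or info_1pl(theta_hat, a, b) < cutoff:
            break
        given.append(b)
    return given

# Hypothetical 40-item pool and cutoff; only the common a-value changes.
pool = [-2.0 + 0.1 * i for i in range(40)]
lower_a = administered(0.0, pool, a=1.0, cutoff=0.6)
higher_a = administered(0.0, pool, a=1.2, cutoff=0.6)
```

In this hypothetical pool the cutoff stays at 0.6. Raising the common a-value from 1.0 to 1.2 then increases the number of items administered at theta = 0 from 11 to 15. That is the same direction of change the report describes for the 1PL simulation tests.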

Average Test Length. As expected, the average test length
increased for both procedures. The 3PL average test length
increased by a little over one item, while the average test
length for the 1PL procedure increased by about five items.
There is no reason to assume that the quality of the 1PL
ability estimates would have decreased dramatically had the
1PL tests been shortened by several items, although it
probably would have been somewhat lower.

Nonconvergence. For the simulation data the 3PL
nonconvergence rate increased to 5.9%, while the 1PL
procedure had a 0.3% nonconvergence rate. Once again,
nonconvergence is clearly a more serious problem for the 3PL
procedure than for the 1PL procedure. As was the case for
the real data, the bulk of the nonconvergence cases for the
3PL procedure (90%) were at the low end of ability. This is
consistent with the results of the test information curve
analyses for the simulation data item pools.

Summary and Conclusions

A study was conducted to compare the 1PL and 3PL models
in a tailored achievement testing application. Both real and
simulation data were employed. For the real data, the 1PL
procedure was found to yield ability estimates that
correlated with paper and pencil test estimates as highly as
did the 3PL tailored test ability estimates, and the 1PL
tests were of about the same average length as the 3PL
tests. For the simulation data, an inappropriately low
information cutoff was used for the 1PL procedure, and as a
result the 1PL tests were on average four to five items
longer than the 3PL tests. The 1PL ability estimates were
found to be significantly more highly correlated with paper
and pencil test estimates than were the 3PL estimates. It is
unclear what the results would have been had the 1PL tests
been terminated earlier.

The 1PL model is a more appealing model than the 3PL
model, since it is simpler to work with, requires smaller
sample sizes, and is overall much less expensive to use than
the 3PL model. The results of this study indicate that for
this type of high-quality, small item pool, there is no
justification for the added expense and complexity of the
3PL model. For this application, the 1PL model was found to
be the model of choice.